
JNCI Cancer Spectrum

Oxford University Press (OUP)

Preprints posted in the last 30 days, ranked by how well they match JNCI Cancer Spectrum's content profile, based on 10 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.

1
Artificial Intelligence and Circulating microRNA Signatures for Early Breast Cancer Detection: A Systematic Review and Meta-Analysis

Solanki, S.; Solanki, N.; Prasad, J.; Prasad, R.; Harsulkar, A.

2026-03-30 oncology 10.64898/2026.03.29.26349657 medRxiv
Top 0.1%
6.8%

Background: Early breast cancer detection remains central to improving clinical outcomes, yet conventional screening pathways, particularly mammography, have recognized limitations in sensitivity, specificity, and performance in dense breast tissue. Circulating microRNAs (miRNAs) have emerged as promising minimally invasive biomarkers, while artificial intelligence and machine learning (AI/ML) offer powerful tools for identifying diagnostically relevant multi-marker patterns within complex biomarker datasets. This systematic review and meta-analysis evaluated the diagnostic performance of AI/ML-based circulating miRNA signatures for early breast cancer detection. Methods: A systematic search of PubMed/MEDLINE, Scopus, and Web of Science Core Collection was conducted from database inception to 31 December 2025. Studies were eligible if they were original human investigations evaluating circulating miRNAs using an AI/ML-based diagnostic model for breast cancer detection and reporting extractable diagnostic performance metrics. Study selection followed PRISMA 2020 and PRISMA-DTA guidance. Methodological quality was assessed using QUADAS-2. Pooled sensitivity and specificity were synthesized using a bivariate random-effects model, and overall diagnostic performance was summarized using a hierarchical summary receiver operating characteristic framework. Results: Seven studies met the inclusion criteria for qualitative synthesis, with eligible studies contributing to the quantitative analysis depending on data availability. Across the pooled analysis, AI/ML-based circulating miRNA models demonstrated good overall diagnostic performance, with a pooled AUC of 0.905 (95% CI: 0.890 to 0.921), pooled sensitivity of 81.3% (95% CI: 76.8% to 85.2%), and pooled specificity of 87.0% (95% CI: 82.4% to 90.7%). Heterogeneity was moderate for AUC (I² = 42.3%) and sensitivity (I² = 38.7%) and low for specificity (I² = 28.4%).
Risk-of-bias assessment showed overall low-to-moderate methodological concern, with patient selection representing the most variable domain. Deeks' funnel plot asymmetry test showed no significant evidence of publication bias (p = 0.34). Conclusions: AI/ML-based circulating miRNA signatures show promising diagnostic accuracy for early breast cancer detection and may have value as non-invasive adjunctive tools within imaging-supported diagnostic pathways. However, the evidence base remains limited by methodological heterogeneity, variable validation rigor, and the predominance of retrospective case-control designs. Prospective, standardized, and externally validated studies are needed before routine clinical implementation can be justified.

2
The tumour microenvironment influences long-term tamoxifen benefit in postmenopausal ER+/HER2- breast cancer patients.

Camargo Romera, P.; Castresana Aguirre, M.; Danielsson, O.; Dar, H.; Ostman, A.; Czene, K.; Lindstrom, L. S.; Tobin, N. P.

2026-03-26 oncology 10.64898/2026.03.24.26349151 medRxiv
Top 0.1%
6.7%

Background: The tumour microenvironment (TME) influences breast cancer progression and treatment response. We investigated whether TME composition predicts tamoxifen benefit in postmenopausal women with oestrogen receptor-positive, HER2-negative (ER+/HER2-) breast cancer. Methods: This study included 513 patients from the Stockholm Tamoxifen (STO-3) trial, which randomised postmenopausal, lymph node-negative women to tamoxifen or no endocrine therapy. Bulk tumour transcriptomes were deconvoluted with the ConsensusTME algorithm to estimate the relative abundance of 18 immune and stromal cell types. A summary score of combined immune cells was created on a per-patient basis and evaluated alongside fibroblast and endothelial stromal compartments. Patients were categorised into immune and stromal tertiles on the basis of these scores. Associations between TME composition and tumour characteristics were evaluated using Spearman correlations and Fisher's exact test. Tamoxifen benefit was analysed by univariable Kaplan-Meier (log-rank) and multivariable Cox proportional hazards adjusting for age, tumour size, grade, progesterone receptor, Ki-67, and radiotherapy. Differential expression was assessed with limma and pathway enrichment with fgsea using Hallmark gene sets from MSigDB. Results: Low immune abundance was significantly associated with higher ER expression (Fisher's exact test p < 0.001). Among tamoxifen-treated patients, those with low immune scores showed improved distant recurrence-free interval (DRFI) relative to untreated patients (log-rank p < 0.001). Similarly, intermediate endothelial (p < 0.001) and low/intermediate fibroblast abundances (p = 0.042, p = 0.009) were associated with favourable DRFI. In multivariable models, low immune (aHR = 0.17, 95% CI 0.08-0.40), intermediate endothelial (aHR = 0.21, 95% CI 0.09-0.51), and low/intermediate fibroblast tertiles (aHR = 0.50, 95% CI 0.27-0.93; aHR = 0.36, 95% CI 0.17-0.77) retained significance.
Transcriptomic analysis revealed enrichment of oestrogen-response, MYC-target, and oxidative-phosphorylation pathways in low-immune and low-fibroblast tumours, while interferon-γ response and allograft rejection pathways were downregulated. Conclusions: TME composition modulates tamoxifen benefit in postmenopausal ER+/HER2- breast cancer. Low immune, intermediate endothelial, and low/intermediate fibroblast abundances are associated with improved benefit from tamoxifen, suggesting that both immune and stromal compartments influence endocrine treatment efficacy.

3
Five-Domain Accelerometer-Derived Behavioral Exposome and Incident Cancer Risk in UK Biobank

Ni Chan Chin (Chengqin Ni), M.; Berrio, J. A.

2026-04-12 epidemiology 10.64898/2026.04.07.26350369 medRxiv
Top 0.1%
3.7%

Background: Accelerometer-derived behavioral phenotype captures multidimensional aspects of human behavior extending well beyond physical activity, encompassing light exposure, step counts, physical activity patterns, sleep, and circadian rhythms. Whether these five domains constitute a unified behavioral architecture underlying cancer risk and whether circadian organization and light exposure confer incremental predictive value beyond movement volume alone remains to be comprehensively established. Methods: We conducted an accelerometer-wide association study (AWAS) encompassing the complete accelerometer-derived behavioral exposome across five behavioral domains in UK Biobank participants with valid wrist accelerometry data. Incident solid cancers were designated as the primary endpoint, with prespecified site-specific solid cancers and hematological malignancy as secondary outcomes. Cox proportional hazards models with age as the timescale were used. The minimal covariate set served as the primary reporting tier, followed by sensitivity analyses additionally adjusting for adiposity/metabolic factors, independent activity patterns, shift work history, and accelerometry measurement quality. Nominal statistical significance was defined as two-sided P < 0.05. Results: Among 89,080 participants, 6,598 incident solid cancer events were observed over a median follow-up of 8.39 years. In the minimally adjusted model, the pan-solid-tumor association atlas was dominated by signals from activity volume, inactivity fragmentation, and circadian rhythm. Higher overall acceleration (HR per SD: 0.91, 95% CI: 0.89-0.94) and higher daily step counts (HR: 0.93, 95% CI: 0.90-0.95) were independently associated with reduced solid cancer risk, while inactivity fragmentation metrics were consistently linked to higher risk.
Notably, circadian rhythms, most prominently cosinor mesor (Midline Estimating Statistic of Rhythm under cosinor model), emerged as leading inverse risk signals, underscoring the independent contribution of circadian behavioral architecture. Site-specific analyses revealed pronounced heterogeneity across tumor sites. Lung cancer exhibited a robust inverse activity-risk gradient, while breast cancer showed reproducible associations with moderate-to-vigorous physical activity (MVPA). Most strikingly, nocturnal light exposure demonstrated a tumor-site-specific association confined to pancreatic cancer, a signal absent across all other sites examined. Associations for uterine cancer were predominantly inactivity-related and substantially attenuated following adjustment for adiposity and metabolic factors. Conclusions: Across five accelerometer-derived behavioral domains, solid cancers as a whole were most consistently associated with a high-movement, low-fragmentation, and circadian-coherent behavioral profile. While site-specific heterogeneity exists, the broad cancer risk landscape is dominated by movement volume, inactivity fragmentation, and circadian rhythmicity. Light exposure, although more localized in its contribution, demonstrates a potentially novel and specific association with pancreatic cancer risk. These findings support a five-domain behavioral exposome framework for cancer epidemiology and, importantly, position circadian rhythm integrity and nocturnal light exposure as critically understudied dimensions warranting dedicated mechanistic investigation.

4
Autopsy-based longitudinal multi-organ high-dimensional profiling reveals lineage plasticity in TRK-inhibitor-resistant secretory breast carcinoma

Muroyama, Y.; Yanagaki, M.; Tada, H.; Ebata, A.; Ito, T.; Ono, K.; Tominaga, J.; Miyashita, M.; Suzuki, T.

2026-04-08 pathology 10.64898/2026.04.06.716668 medRxiv
Top 0.1%
3.6%

Secretory breast carcinoma (SBC) is typically indolent, yet mechanisms underlying aggressiveness and therapeutic resistance to tropomyosin receptor kinase inhibitors (TRKi) remain unclear. Autopsy-based longitudinal multi-organ high-dimensional profiling of metastatic TRKi-resistant SBC demonstrated histopathological heterogeneity, including secretory and squamous components, arising from a shared clonal origin. Integrated genomic and transcriptomic analyses revealed hierarchical transcriptional rewiring consistent with a lineage-plastic state, suggesting a potential link to tumor aggressiveness and therapeutic resistance.

5
Predicting 5-Year Breast Cancer Risk from Longitudinal Digital Breast Tomosynthesis: A Single-center Retrospective Study

Xu, Y.; Heacock, L.; Park, J.; Pasadyn, F. L.; Lei, Q.; Lewin, A.; Geras, K. J.; Moy, L.; Schnabel, F.; Shen, Y.

2026-03-24 radiology and imaging 10.64898/2026.03.22.26349001 medRxiv
Top 0.1%
3.6%

Background: Imaging-based breast cancer risk prediction models primarily use full-field digital mammography (FFDM). As digital breast tomosynthesis (DBT) has become a predominant screening modality in the United States, its potential for long-term breast cancer risk prediction remains under-explored. Objective: To develop and evaluate a deep learning model that uses longitudinal DBT exams to predict long-term breast cancer risk. Methods: This retrospective study included 313,531 DBT exams from 161,165 women (mean age, 58.5 years; SD, 11.7 years) between January 2016 and August 2020 at Institute A. A risk prediction (DRP) model was developed to estimate 2-5 year breast cancer risk using longitudinal DBT exams, patient age, and breast density. Model performance was compared with a single-time-point DBT model, the Mirai model using same-day FFDM, and the Tyrer-Cuzick model, using the area under the receiver operating characteristic curve (AUC), time-dependent concordance index, and integrated Brier score. Results: In an independent test set (n = 34,580), the longitudinal DRP model achieved a 5-year AUC of 0.720 (95% CI, 0.703-0.738), improving on the single-time-point DRP model (AUC, 0.706; 95% CI, 0.687-0.724; p < 0.001) and the Mirai model (AUC, 0.687; 95% CI, 0.668-0.705; p < 0.001). In a matched case-control cohort (n = 432), the DRP model achieved a 5-year AUC of 0.676 (95% CI, 0.626-0.727), compared with 0.567 (95% CI, 0.514-0.621; p < 0.001) for the Tyrer-Cuzick model. The model reclassified 37.6% (705/1,877) of women with extremely dense breasts as average risk, with a 5-year cancer incidence of 0.7% (5/705), and identified 15.5% (404/2,605) of women with fatty breasts as high risk, with a 5-year cancer incidence of 2.5% (10/404). Conclusion: A deep learning model using longitudinal DBT examinations improved long-term breast cancer risk prediction compared with FFDM-based and clinical risk models.
Clinical Impacts: Longitudinal DBT-based risk prediction may enable dynamic risk assessment using screening images, supporting personalized screening strategies and more targeted use of supplemental imaging.

6
Validation of Immunoscore for Prognostic Stratification in HPV-associated Oropharyngeal Cancer: An International Multicenter Study

Nguyen, D. H.; Majdi, A.; Marliot, F.; Houtart, V.; Kirilovsky, A.; Hijazi, A.; Fredriksen, T.; de Sousa Carvalho, N.; Bach, A.- S.; Gaultier, A.- L.; Fabiano, E.; Kreps, S.; Tartour, E.; Pere, H.; Veyer, D.; Blanchard, P.; Angell, H. K.; Pages, F.; Mirghani, H.; Galon, J.

2026-04-11 oncology 10.64898/2026.04.08.26350238 medRxiv
Top 0.1%
3.6%

Background: Treatment optimization in HPV-associated oropharyngeal cancer (OPSCC) remains challenging, as recent de-escalation trials have shown limited success. Current patient selection strategies based on smoking history and TNM classification are insufficient, highlighting the need for robust, standardized prognostic biomarkers. We report the first validation of the Immunoscore (IS) for prognostic stratification in HPV-associated OPSCC. Patients and methods: We analyzed 191 HPV-associated (p16+ and HPV DNA/RNA+) OPSCC patients from an international multicenter cohort (2015-2024), comprising a French monocentric retrospective training cohort (N = 48) and three validation cohorts: French monocentric retrospective (N = 48), French multicenter prospective (N = 50), and US multicenter retrospective (N = 45). IS is a standardized digital pathology assay quantifying CD3+ and CD8+ densities in tumor cores and invasive margins, with cut-offs defined in the training cohort and validated across cohorts. Associations with disease-free survival (DFS), time to recurrence (TTR) and overall survival (OS) were assessed, alongside 3' RNA-seq and sequential immunofluorescence profiling of immune composition. Results: Median age 65; 80% male; 74% smokers; 66% T1-2; 82% N0-1 (AJCC 8th). IS-High patients demonstrated superior 3-year DFS in the training and validation cohorts 1-3 (all log-rank P < 0.05). Multivariable analysis identified IS-Low as the strongest independent risk factor for DFS (HR 9.03; 95% CI: 4.02-20.31; P < 0.001). The model combining IS with clinical factors showed higher predictive accuracy for DFS (C-index 0.82) than clinical variables alone (0.7; P < 0.0001). Similar findings were observed for TTR and OS. IS-High tumors showed markedly higher enrichment of lymphoid and myeloid immune cell populations, contrasting with immune-poor signatures in IS-Low tumors.
Conclusions: IS is a robust biomarker that outperforms standard clinical variables in both prognostic and predictive accuracy. The enriched cytotoxic immune infiltrate in IS-High tumors explains favorable outcomes and supports their suitability for treatment de-escalation. Prospective validation is warranted.

7
Cancer-Type Specific Prognostic Impact of Concurrent TP53 and KRAS Alterations: A Multi-Cohort Genomic Analysis

Pan, G.

2026-03-30 oncology 10.64898/2026.03.29.26349383 medRxiv
Top 0.1%
3.5%

Background: The tumor suppressor gene TP53 and the oncogene KRAS are among the most frequently altered core drivers in human malignancies. Although they cooperatively regulate critical biological processes, the prognostic impact of their co-alterations remains poorly defined and exhibits striking inconsistency across different cancer types. Methods: We comprehensively analyzed genomic and clinical data from multi-cancer cohorts sourced from the cBioPortal database and The Cancer Genome Atlas (TCGA). Genetic alterations, including sequence variations and copy number alterations (CNAs), were classified for TP53 and KRAS. Patients were stratified into four subgroups based on individual or combined alteration status. Survival analyses were performed using Kaplan-Meier methods. Integrated multi-omics analyses were conducted to assess the relationship between genetic alterations and mRNA/protein expression, and to characterize co-occurring genetic events and their prognostic implications. Results: Patients harboring concurrent TP53 and KRAS alterations exhibited significantly shorter overall survival in pancreatic cancer, colorectal cancer, and ampullary carcinoma, but surprisingly demonstrated the longest survival in gastric cancer. Distinct KRAS mutation subtype distributions were observed across cancer types: G12D/G12V predominated in pancreatic and colorectal cancers, G12C in non-small cell lung cancer, and G13D in gastric cancer, with copy number alterations representing a substantial proportion of KRAS alterations in gastric and lung cancers. Multi-omics analysis revealed a lack of concordance between genetic alterations and mRNA/protein expression, indicating that mutation status alone does not reliably reflect downstream molecular changes.
Concurrent genetic events displayed striking cancer-type specificity: CDKN2A alterations frequently co-occurred with TP53/KRAS double alterations in pancreatic cancer and were associated with worse prognosis, whereas APC mutations co-occurred in colorectal cancer and correlated with improved survival. Integrated analysis further demonstrated that KRAS-altered/TP53-altered patients were highly enriched in pancreatic, colorectal, and lung cancers, each exhibiting unique background genomic landscapes. Conclusions: The prognostic significance of TP53 and KRAS alterations is profoundly cancer-type specific, driven by differences in mutation subtype distribution, copy number alteration patterns, co-occurring genetic events, and the discordance between genotype and functional expression. These findings challenge the simplistic view of dual-gene alterations as universal markers of poor prognosis and underscore the necessity of incorporating cancer-specific molecular contexts into prognostic models and precision oncology strategies.

8
Leveraging Large Language Models to Extract Prognostic Pathology Features in Ewing Sarcoma

Huang, J.; Batool, A.; Gu, Z.; Zhao, Z.; Yao, B.; Black, J.; Davis, J.; al-Ibraheemi, A.; DuBois, S.; Barkauskas, D.; Ramakrishnan, S.; Hall, D.; Grohar, P.; Xie, Y.; Xiao, G.; Leavey, P. J.

2026-03-19 bioinformatics 10.64898/2026.02.20.707103 medRxiv
Top 0.2%
2.1%

Background: Current risk stratification for Ewing sarcoma relies heavily on clinical factors such as metastatic status, failing to capture histologic heterogeneity as a potential prognostic indicator. Although pathology reports contain rich biological data, this information remains locked in unstructured narrative text, limiting large-scale retrospective analyses. We aimed to validate the utility of Large Language Models (LLMs) for scalable data abstraction and to identify prognostic histologic features from a large multi-institutional cohort. Methods: We conducted a retrospective cohort study using data from six Children's Oncology Group (COG) clinical trials. We utilized an LLM-based pipeline (OpenAI o3) to extract structured variables, including immunohistochemical (IHC) markers and CD99 staining patterns, from digitized, Optical Character Recognition (OCR)-processed pathology reports. Extraction accuracy was validated against a human-annotated ground truth (n = 200) and cross-validated against senior experts (n = 48). We assessed the association between extracted features and Overall Survival (OS) using Kaplan-Meier analysis and multivariable Cox proportional hazards regression, adjusting for metastatic status. Findings: We analyzed 931 diagnostic pathology reports spanning over 19 years. The LLM achieved a weighted average accuracy of 94% across 17 IHC markers; in a cross-validation subset, the LLM outperformed human annotators (weighted average accuracy over 15 IHC markers: LLM o3, 98.1%; a resident specialist, 91.4%; a senior expert, 95.9%). Survival analysis identified Neuron-Specific Enolase (NSE) and S100 as significant prognostic biomarkers. After adjusting for metastatic status, NSE positivity was associated with significantly inferior survival (HR 2.15, 95% CI 1.15-4.02, p = 0.016); this risk was most pronounced in patients with non-metastatic disease (HR 5.64, p = 0.0055).
Conversely, S100 positivity was associated with improved survival (HR 0.58, 95% CI 0.34-1.00, p=0.046). Interpretation: LLM-assisted extraction of pathology variables is highly accurate and scalable, capable of unlocking "dark data" from historical clinical trials. We identified NSE as a potent risk factor and S100 as a protective marker in Ewing sarcoma, particularly in localized disease. These findings suggest that AI-derived histologic data can refine risk stratification and, if validated, warrant inclusion in future prospective trials.

9
Discordance in pleural mesothelioma response classification and modelling of impact on clinical trials

Cowell, G. W.; Roche, J.; Noble, C.; Stobo, D. B.; Papanastasiou, A.; Kidd, A. C.; Tsim, S.; Blyth, K. G.

2026-03-20 oncology 10.64898/2026.03.18.26348731 medRxiv
Top 0.2%
2.1%

Introduction: Agreement between radiologists regarding treatment response in Pleural Mesothelioma (PM) is acknowledged to be poor, but downstream effects in clinical trials have not been quantified. Methods: We performed a mixed methods study, composed of a multicentre, retrospective cohort study and in silico modelling. CT images and data were retrieved from 4 UK centres regarding chemotherapy-treated patients. Expert radiologists classified response using modified Response Evaluation Criteria In Solid Tumours (mRECIST) v1.1, generating discordance rate (%) and agreement. In silico modelling simulated two-arm trials of an active therapy with intended 80% power and confidence intervals for four endpoints (objective response rate (ORR), disease control rate (DCR), progression-free survival (PFS), overall survival (OS)) covering 95% of the true effect. Actual power and endpoint coverage were modelled against mRECIST misclassification rate (a single reporter equivalent of discordance rate). Consecutive simulations varied misclassification rate from 0-100% in 1% increments, each repeated 10,000 times. Results: 172 cases were included. Discordance rate was 35% (60/172), kappa = 0.456. In silico modelling demonstrated reduced power and endpoint precision with increasing misclassification. At 17% misclassification, corresponding to the observed 35% discordance, power dropped from 80% to 55% for ORR, 53% for DCR, 65% for PFS and 66% for OS, with endpoint coverage reduced to 88%, 89%, 92% and 92%, respectively. 50/60 (83%) discordances reflected interpretation or measurement differences intrinsic to mRECIST. Discordance was not associated with tumour volume. Conclusions: Inconsistent response classification is common in PM and substantially reduces statistical power and endpoint precision in clinical trials.
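To illustrate why misclassification erodes power for a binary endpoint such as ORR, here is a minimal analytic sketch. It is not the authors' simulation: it assumes that each patient's responder label flips independently with the same probability in both arms, uses a normal-approximation two-sided z-test, and uses hypothetical response rates (20% vs 40%) and arm size (79). Its numbers are therefore not expected to reproduce the figures in the abstract.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def observed_rate(true_rate: float, misclass: float) -> float:
    """Response rate seen when each binary label flips with probability `misclass`.

    Assumption: symmetric, non-differential misclassification.
    """
    return true_rate * (1.0 - misclass) + (1.0 - true_rate) * misclass


def power_two_proportions(p_ctrl: float, p_trt: float, n_per_arm: int,
                          alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test comparing two proportions.

    Uses the unpooled standard error and ignores the negligible
    probability of rejecting in the wrong direction.
    """
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    se = sqrt(p_ctrl * (1.0 - p_ctrl) / n_per_arm
              + p_trt * (1.0 - p_trt) / n_per_arm)
    return _STD_NORMAL.cdf(abs(p_trt - p_ctrl) / se - z_crit)


def power_under_misclassification(p_ctrl: float, p_trt: float,
                                  n_per_arm: int, misclass: float) -> float:
    """Power when both arms are assessed with the same misclassification rate."""
    return power_two_proportions(observed_rate(p_ctrl, misclass),
                                 observed_rate(p_trt, misclass),
                                 n_per_arm)


if __name__ == "__main__":
    # Hypothetical trial: 20% vs 40% true response, 79 patients per arm.
    for m in (0.0, 0.05, 0.10, 0.17, 0.25):
        power = power_under_misclassification(0.20, 0.40, 79, m)
        print(f"misclassification {m:.0%}: approximate power {power:.1%}")
```

Under this assumption the observed difference between arms shrinks by a factor of (1 - 2m) for misclassification rate m, which is what drives the loss of power; at m = 50% the arms become indistinguishable.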

10
Quantitative assessment of collagen architecture from routine histopathological images shows concordance with Second Harmonic Generation microscopy

Ingawale, V.; Dandapat, K.; Konkada Manattayil, J.; Gupta, S.; Shashidhara, L. S.; Koppiker, C.; Shah, N.; Raghunathan, V.; Kulkarni, M.

2026-04-06 pathology 10.64898/2026.03.31.26349841 medRxiv
Top 0.2%
1.9%

Collagen organisation within the tumour microenvironment plays a critical role in tumour progression and has emerged as an important structural biomarker in cancer. Second Harmonic Generation (SHG) microscopy enables label-free visualisation and quantitative assessment of fibrillar collagen architecture; however, its high cost, specialised instrumentation, and limited field-of-view restrict routine clinical application. In this study, we evaluated whether collagen features quantified from digitally scanned Masson-Goldner's Trichrome-stained histopathological sections can approximate measurements obtained from SHG microscopy. Formalin-fixed paraffin-embedded breast tumour tissues, including benign and invasive ductal carcinoma (IDC) samples with varying collagen content, were analysed using SHG microscopy and whole-slide brightfield imaging. Matched regions of interest were analysed using two independent digital image analysis approaches: a conventional ImageJ-based workflow (TWOMBLI) and a machine learning-based computational pipeline. Collagen structural parameters including collagen deposition area, fibre number, and alignment metrics were quantified and compared across imaging modalities using correlation analysis. SHG signals were consistently detected from trichrome-stained sections, confirming compatibility of SHG imaging. Quantitative comparison demonstrated significant concordance between SHG-derived collagen metrics and those obtained from digital image analysis pipelines, particularly for collagen area and fibre alignment. These findings demonstrate that computational analysis of routine histopathological images can capture key spatial features of collagen organisation comparable to SHG microscopy. Digital pathology-based collagen quantification therefore represents a scalable and clinically accessible approach for assessing extracellular matrix architecture in tumour tissues.

11
Cardiorespiratory fitness, polygenic risk, and breast cancer in postmenopausal women: a prospective cohort study

Tanisawa, K.; Watanabe, D.; Li, Q.; Fan, X.; Sun, X.

2026-03-19 sports medicine 10.64898/2026.03.12.26347589 medRxiv
Top 0.2%
1.8%

Objective: To examine the joint associations of cardiorespiratory fitness (CRF) and polygenic risk with incident breast cancer and whether higher CRF attenuates excess breast cancer risk associated with high polygenic risk in postmenopausal women. Methods: This prospective cohort study included postmenopausal women from the UK Biobank. CRF was estimated using a submaximal cycle ergometer test, and genetic susceptibility was assessed using a breast cancer polygenic risk score (PRS). Associations of CRF and PRS with incident breast cancer were examined using Cox proportional hazards models with age as the underlying time scale. Analyses were conducted overall and stratified by age (40-59 and ≥60 years) and body mass index (BMI) (<25 and ≥25 kg/m²). Multiplicative and additive interactions were evaluated, with additive interaction assessed using the relative excess risk due to interaction (RERI). Results: During a median follow-up of 10.7 years, 500 incident breast cancer cases were identified among 13,907 postmenopausal women. Higher CRF was associated with a lower breast cancer risk in a dose-response manner. Although multiplicative interaction was not significant, higher CRF attenuated excess risk associated with high polygenic risk on the additive scale (RERI -0.84, 95% CI -1.56 to -0.12). This attenuation was particularly evident among women aged ≥60 years and those with BMI ≥25 kg/m². Conclusion: Higher CRF was associated with a lower breast cancer risk and attenuated excess breast cancer risk associated with high polygenic risk, particularly among postmenopausal women at elevated baseline risk, supporting a potential role for improving CRF in genetically informed breast cancer prevention.
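For readers unfamiliar with the additive-scale measure named in this abstract, the standard definition of RERI for two binary exposures is given below. The abstract does not state which CRF and PRS categories served as the reference or how the exposures were coded, so the subscripts are generic; in a Cox model, hazard ratios are typically used in place of relative risks.

```latex
\[
\mathrm{RERI} = \mathrm{RR}_{11} - \mathrm{RR}_{10} - \mathrm{RR}_{01} + 1
\]
```

Here RR_11 is the relative risk with both exposures present, and RR_10 and RR_01 are the relative risks with only one present, all relative to the group with neither. A value of 0 indicates exact additivity, and a negative value indicates that the joint excess risk is smaller than the sum of the individual excess risks.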

12
Pregnancy Desire and Pregnancy Attempt: Why Words Matter in Reproductive Research -- A Nationwide Cross-Sectional Cohort Study

Kabirian, R.; Bas, R.; Chabassier, A.; Sebbag, C.; Rousset-Jablonski, C.; Bobrie, A.; Coussy, F.; Preau, M.; Espie, M.; Dumas, E.; Reyal, F.; Jacob, G.; Jochum, F.; Hamy Petit, A.-S.

2026-03-19 oncology 10.64898/2026.03.17.26348589 medRxiv
Top 0.2%
1.8%

Objective: To quantify the gap between pregnancy desire and pregnancy attempts among young women with and without a history of breast cancer (BC), and to identify factors associated with this gap. Design: Cross-sectional cohort study. Setting: The FEERIC study, conducted in France. Population: Women aged 18-43 years, with or without prior BC, who completed the inclusion forms of a collaborative study. Methods: Pregnancy desire was assessed by self-report ("Do you currently desire a pregnancy?"). Attempt was defined as engaging in unprotected intercourse with the intention to conceive. The pregnancy desire-attempt gap was defined as expressing a desire for pregnancy without actively trying to conceive. Logistic regression was used to evaluate associated demographic, clinical, and treatment-related factors. Main outcome measures: Prevalence of the pregnancy desire-attempt gap and predictors of this gap among BC survivors. Results: Of 4,351 participants (517 with BC and 3,834 controls), 735 (16.9%) reported a pregnancy desire, of whom 54% were attempting conception and 46% were not. The desire-attempt gap was significantly more frequent in women with a history of BC (OR = 1.62, 95% CI 1.15-2.30). Among BC survivors, younger age (<30 years), nulliparity, being single, and ongoing endocrine therapy were independently associated with the gap, whereas prior chemotherapy or trastuzumab were not. Conclusions: Nearly half of women who report desiring pregnancy do not initiate pregnancy attempts, with a larger gap among BC survivors. These findings highlight the need to explore both medical barriers and psychosocial determinants underlying this gap and underscore the importance of refining the language used in reproductive research. Funding: This study was supported by "SHS INCa" grant no. 2016-124 and is part of a research project on young women funded by Monoprix.

13
ExposoGraph: An Interactive Platform for Carcinogen Bioactivation and Detoxification Pathway Visualization

Pienta, K.; Kazi, J. U.

2026-03-24 bioinformatics 10.64898/2026.03.22.713456 medRxiv
Top 0.2%
1.7%

Background: Despite extensive cataloging of carcinogenic exposures by the International Agency for Research on Cancer (IARC) and pharmacogenomic variation by resources such as PharmVar and CPIC, few platforms unify exposure, metabolic activation and detoxification, DNA damage, and genetic annotation within a single interactive visualization framework. This gap limits systematic evaluation of gene-environment interactions in cancer risk assessment. Methods: We developed the Carcino-Genomic Knowledge Graph, ExposoGraph, an interactive knowledge-graph platform for carcinogen metabolism and DNA damage pathways. The reference graph integrates curated data and annotations from IARC, KEGG, PharmVar, CPIC, CTD, and supporting literature/resources. The current reference graph contains 96 nodes across 5 entity types (Carcinogens, Enzymes, Metabolites, DNA Adducts, and Pathways) and 102 edges across 6 relationship types (activates, detoxifies, transports, forms adduct, repairs, and pathway). Results: The first-generation reference graph captures metabolic activation and detoxification pathways for 9 carcinogen classes spanning 15 index carcinogens. It represents 36 enzymes across Phase I activation (n = 14), Phase II conjugation and detoxification (n = 14), Phase III transport (n = 3), and DNA repair (n = 5). Interactive exploration supports carcinogen-class filtering, node- and edge-type filtering, metadata-based search, and detailed hover/detail views with provenance and pharmacogenomic annotations. The androgen branch highlights cross-pathway connectivity by linking androgen metabolism to estrogen quinone formation and DNA adduct generation through CYP19A1-mediated aromatization and downstream catechol estrogen chemistry. In the optional androgen-focused extension, additional receptor, tissue, and variant context further connects this branch to androgen receptor signaling and genotype-specific annotations.
ConclusionsExposoGraph provides a first-generation integrated, interactive framework linking carcinogenic exposures to metabolic fates and genetic modulators. The platform supports hypothesis generation for gene-environment interaction studies and may inform future individualized risk modeling, while remaining a research-use framework rather than a clinically validated risk-assessment tool.

14
Accelerometer-derived circadian rhythm and colorectal cancer risk in UK Biobank: a prospective cohort study

Ni Chan Chin, M.; Berrio, J. A.

2026-04-05 oncology 10.64898/2026.04.03.26350124 medRxiv
Top 0.3%
1.7%

Background: While total physical activity is a recognized modifier of cancer risk, accelerometer-derived digital phenotyping enables high-resolution mapping of circadian behavior. Whether these multidimensional patterns, comprising step counts, sleep, physical activity, circadian rhythmicity, and light exposure, independently influence the risk of incident colorectal cancer (CRC) has not been comprehensively evaluated. Methods: We performed an exposure-wide association study (ExWAS) of 224 accelerometer-derived metrics among 95,050 UK Biobank participants who were free of CRC at accelerometry. To comprehensively define circadian rhythm patterns, we systematically categorized these metrics into five core behavioral domains: step counts, sleep architecture, physical activity bouts, circadian rhythmicity, and light exposure. Hazard ratios (HRs) and 95% confidence intervals were estimated using Cox proportional hazards models with age as the underlying timescale. Results: During a median follow-up of 8.5 years, 775 participants developed CRC (503 colon; 269 rectal). In minimally adjusted models, 121 metrics showed nominal significance (31 for overall CRC, 89 for colon, and 1 for rectal cancer). Protective associations were predominantly observed for metrics characterizing activity intensity and bout structure; notably, higher mean acceleration during 5-10 minute bouts of moderate-to-vigorous physical activity was associated with reduced CRC risk (HR 0.88 per SD). In contrast, no metrics within the defined sleep or light exposure domains reached nominal significance. These associations attenuated substantially following progressive adjustment for lifestyle and metabolic covariates, suggesting potential confounding or shared biological pathways. 
Conclusions: Our findings identified specific behavioral phenotypes within a multidimensional framework of circadian rhythm, including step counts, physical activity intensity, and bout structure, as being associated with CRC risk. However, the marked attenuation of signals after multivariable adjustment suggests these markers may not serve as independent predictors. These results underscore the complexity of multidimensional circadian digital biomarkers and necessitate independent replication to clarify their utility in cancer risk stratification.

15
Changes in Cardiorespiratory Fitness in Patients with Human Papillomavirus (HPV)-Related Oropharyngeal Cancer Undergoing Chemoradiotherapy

Burgess, M.; Thomson, J.; Fox, B.; Salaz Diaz, E.; Taylor, G. S.; Brownstein, C. G.; Iqbal, M. S.; O'Hara, J.; Sinclair, R.; Orange, S. T.

2026-04-04 oncology 10.64898/2026.04.03.26350101 medRxiv
Top 0.3%
1.7%

Purpose: Chemoradiotherapy (CRT) for human papillomavirus-related oropharyngeal cancer (HPV+ OPC) causes substantial treatment-related toxicity, with well-known adverse effects on quality of life (QoL), weight loss, and self-reported physical functioning. However, its impact on objectively measured cardiorespiratory fitness is unknown. This study examined changes in cardiorespiratory fitness, body composition, grip strength, and patient-reported outcomes in patients with HPV+ OPC undergoing CRT. Methods: We invited 20 patients with HPV+ OPC scheduled for CRT (age: 61.2 ± 7.1 years, female: n=4) to complete assessments at three timepoints: pre-CRT (baseline), 2 weeks post-CRT, and 8 weeks post-CRT. Cardiorespiratory fitness was assessed using a maximal incremental cardiopulmonary exercise test (CPET). Body composition was estimated using segmental bioelectrical impedance analysis. QoL was assessed using the EORTC QLQ-C30 and QLQ-H&N43, and physical activity was self-reported using the International Physical Activity Questionnaire-Short Form. The primary outcome was change in oxygen consumption at the anaerobic threshold (V̇O2 at AT) measured during CPET, an objective, effort-independent marker of cardiorespiratory fitness. Results: Mean V̇O2 at AT declined from 16.0 ± 3.8 ml/kg/min at baseline to 12.0 ± 3.4 ml/kg/min at 2 weeks post-CRT (adjusted mean change: -4.2, 95% CI: -5.4 to -3.0 ml/kg/min) and remained low at 8 weeks post-CRT. Peak oxygen consumption (V̇O2peak: -7.4, -9.3 to -5.4 ml/kg/min), body mass (-8.5, -10.7 to -6.2 kg), fat-free mass (-6.4, -7.7 to -5.0 kg), grip strength (-4.1, -7.2 to -0.99 kg), global health status (-26.9, -39.2 to -14.6 points), fatigue (49.8, 33.7 to 65.8 points), and several disease-specific symptoms were also adversely affected at 2 weeks post-CRT and remained impaired at 8 weeks. Conclusion: This is the first study to estimate the impact of CRT on cardiopulmonary fitness in patients with HPV+ OPC. 
Cardiorespiratory fitness declined by ~25% following CRT and remained reduced at 8 weeks. Targeted interventions to mitigate these adverse physiological effects warrant further investigation.
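
As a rough check on the ~25% figure, the relative decline can be recomputed from the unadjusted means quoted in the abstract. The sketch below is illustrative only: the helper name `percent_decline` is hypothetical, and using the raw means (or the adjusted mean change applied to the baseline mean) is an assumption about how the authors arrived at the percentage.

```python
def percent_decline(baseline: float, follow_up: float) -> float:
    """Relative decline from baseline, expressed as a percentage."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - follow_up) / baseline * 100.0


# Unadjusted mean oxygen consumption at the anaerobic threshold (ml/kg/min),
# as quoted in the abstract.
baseline_at = 16.0
post_crt_at = 12.0

print(f"Decline from raw means: {percent_decline(baseline_at, post_crt_at):.1f}%")

# Assumption: applying the adjusted mean change (-4.2) to the baseline mean.
adjusted_follow_up = baseline_at - 4.2
print(f"Decline from adjusted change: {percent_decline(baseline_at, adjusted_follow_up):.2f}%")
```

Both routes land close to one quarter of the baseline value, which is consistent with the rounded figure in the conclusion.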

16
Novel risk models based on screening history results and timing of lung cancer diagnosis: Post hoc analysis of the National Lung Cancer Screening Trial

Haddan, S.; Waqas, A.; Rasool, G.; Schabath, M. B.

2026-04-14 epidemiology 10.64898/2026.04.12.26350705 medRxiv
Top 0.3%
1.6%

Background: Our group previously reported that lung cancer (LC) screening history results and subsequent timing of diagnosis are associated with significant differences in survival outcomes. As a follow-up study, we sought to develop novel personalized risk models that considered screening history for incidence cancers, interval LCs, and prevalence LCs. Methods: Using data from the CT-arm of the NLST, four independent case-control analyses were conducted to develop parsimonious risk models. Controls (n=26,038) were those never diagnosed with LC. The four LC case groups were 270 prevalence LCs, 44 interval LCs, 206 screen-detected LCs (SDLCs) that had a baseline positive screen, and 164 SDLCs that had a baseline negative screen. For each case-control analysis, univariable analyses identified statistically significant covariates from 48 variables and then significant covariates were included into a stepwise backward selection approach to identify a model with the most informative covariates. Results: For prevalence LCs, the model (AUC=0.711) included age, pack-years smoked, BMI, smoking status, smoking onset age, personal history of cancer, family history of LC, alcohol consumption, and milling occupation. For interval LCs, the model (AUC=0.734) included age, smoking status, smoking onset age, cigar smoking, marital status, and asbestos occupation. For baseline positive SDLCs, the model (AUC=0.685) included age, pack-years smoked, BMI, emphysema, chemicals/plastics exposure, and milling occupation. For baseline negative SDLCs, the model (AUC=0.701) included age, pack-years smoked, BMI, smoking status, emphysema, sarcoidosis, and sandblasting occupation. Conclusions: Besides smoking and age, which are inclusion criteria for screening, these models identified other important risk factors which could be used to provide personalized LC risk assessment and screening management.

17
Time to diagnosis among children and adolescents with cancer in Quebec, Canada: a population-based study

Mullen, C.; Barr, R. D.; Strumpf, E.; El-Zein, M.; Franco, E. L.; Malagon, T.

2026-04-13 epidemiology 10.64898/2026.04.09.26350491 medRxiv
Top 0.3%
1.5%

Background: Timely cancer diagnosis in children and adolescents is critical to improving outcomes, yet substantial variation in diagnostic intervals persists across cancer types and care settings. We aimed to quantify time to diagnosis and assess variations by patient, demographic, and system-level factors. Methods: We conducted a retrospective population-based study of children and adolescents aged 0-19 years diagnosed with one of 12 common cancers between 2010 and 2022 in Quebec, Canada. The diagnostic interval was defined as the time from first cancer-related healthcare encounter to diagnosis. We calculated medians and interquartile ranges (IQR) overall and by cancer type and used multivariable quantile regression to identify factors associated with time to diagnosis at the 25th, 50th, and 75th percentiles. Results: Among 2,927 individuals with cancer, diagnostic intervals varied by cancer type and age. Median intervals were longest for carcinomas (100 days; IQR 33-192) and shortest for leukemias (8 days; IQR 3-44). Compared with children living in Montreal, living in regional areas and other large urban centres was associated with longer 50th and 75th percentiles of time to diagnosis for hepatic and central nervous system (CNS) tumours. Diagnostic intervals were shorter in the post-pandemic period (2020-2022) across several cancer sites, with CNS tumours showing reductions across all quantiles. Interpretation: Diagnostic timeliness differed by cancer type, age, and rurality, but not by sex, material, or social deprivation. The shorter diagnostic intervals observed in the post-pandemic period suggest that pandemic-related changes in care pathways may have expedited diagnosis for some cancers.

18
Activity of low dose nivolumab in patients with advanced squamous cell carcinomas and other cancers

Gauduchon, T.; Fayette, J.; Amini-Adle, M.; Neidhart-Berard, E.-M.; Brahmi, M.; Dufresne, A.; Dupont, M.; Coutzac, C.; De Bernardi, A.; Toussaint, P.; Mery, B.; Crumbach, L.; Ray-Coquard, I.; Dutour, A.; Castets, M.; Blay, J.-Y.; HEUDEL, P.

2026-03-27 oncology 10.64898/2026.03.25.26349285 medRxiv
Top 0.3%
1.5%

Immune checkpoint inhibitors such as anti-PD1 antibodies are essential in cancer therapy. Emerging data suggest that lower doses may be effective and more economical, though further evidence is needed. We conducted a retrospective study at Centre Leon Berard to assess the efficacy and safety of low-dose nivolumab (20 mg every three weeks) in patients with advanced cancer, mainly squamous cell carcinomas (SCC). Between 2023 and 2024, 53 patients were treated, with a median age of 74 years; 39.6% were over 80. Most were male (64%) and had ECOG >1 (69.9%). Primary tumor sites included cutaneous SCC (34%), head and neck SCC (32%), and soft tissue sarcoma (15%). After a median follow-up of 8.3 months, median overall survival was 7.5 months. The objective response rate (ORR) was 20.8% overall, rising to 35.3% in cutaneous SCC and 23.5% in head and neck SCC, comparable to standard-dose nivolumab. Toxicity was manageable: 18.7% experienced immune-related adverse events, with only 3.7% grade 3. Low-dose nivolumab demonstrates encouraging efficacy and tolerability in a frail population, supporting its potential role in resource-limited settings. Prospective trials are warranted to confirm these findings in broader populations.

19
Prospective Population-Scale Validation of an Electronic Health Record Based Model for Pancreatic Cancer Risk

Lahtinen, E.; Schigiltchoff, N.; Jia, K.; Kundrot, S.; Palchuk, M. B.; Warnick, J.; Chan, L.; Shigiltchoff, N.; Sawhney, M. S.; Rinard, M.; Appelbaum, L.

2026-04-13 oncology 10.64898/2026.04.11.26350318 medRxiv
Top 0.3%
1.3%

Background and aims: Pancreatic ductal adenocarcinoma (PDAC) surveillance is limited to individuals with familial or genetic risk although most future cases arise outside these groups. In a retrospective study, PRISM, an electronic health record (EHR)-based PDAC risk model, identified individuals in the general population at elevated near-term risk of PDAC. We aimed to prospectively evaluate whether PRISM can identify high-risk individuals beyond current surveillance groups across U.S. health systems. Methods: We performed a prospective multicenter cohort study after deployment of PRISM in April 2023 across 44 U.S. health care organizations. Eligible adults aged ≥40 years without prior PDAC received a single baseline risk score and were assigned to prespecified risk tiers. Patients were followed for incident PDAC for 30 months. We estimated tier-specific 30-month cumulative incidence (positive predictive value, PPV), number needed to screen (NNS), standardized incidence ratios (SIRs), and time from deployment and first high-risk flag to diagnosis. Results: Among 6,282,123 adults assigned a PRISM score, 5,058,067 had follow-up; 3,609 developed PDAC. The highest-risk tier had 30-fold higher PDAC incidence than the study population. At the SIR 5 threshold, 30-month cumulative incidence was 0.35% (NNS, 284.2); at SIR 16, 1.14% (NNS, 87.4); and at SIR 30, 2.19% (NNS, 45.7). Median time from deployment to PDAC diagnosis was 9.5 months, and median time from first high-risk flag to diagnosis at SIR 5 was 3.5 years. Shapley additive explanations (SHAP) analyses supported patient- and tier-level interpretability. Conclusions: Prospective deployment of PRISM across multiple U.S. health care organizations identified individuals at elevated near-term risk for PDAC, with substantial risk enrichment and lead time before diagnosis. These findings support the real-world scalability and generalizability of EHR-based risk stratification for risk-adapted early detection. 
ClinicalTrials.gov identifier NCT05973331
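
The number needed to screen reported for each tier is close to the reciprocal of that tier's 30-month cumulative incidence. The sketch below illustrates that relationship using only the rounded percentages quoted in the abstract; the function name `number_needed_to_screen` is hypothetical, and treating NNS as exactly 1/PPV is an assumption about the authors' definition. Small gaps against the reported values are expected because the published percentages are rounded.

```python
def number_needed_to_screen(cumulative_incidence_pct: float) -> float:
    """NNS as the reciprocal of cumulative incidence given in percent."""
    if cumulative_incidence_pct <= 0:
        raise ValueError("cumulative incidence must be positive")
    return 100.0 / cumulative_incidence_pct


# (SIR threshold) -> (30-month cumulative incidence in %, NNS as reported)
reported = {
    5: (0.35, 284.2),
    16: (1.14, 87.4),
    30: (2.19, 45.7),
}

for sir, (incidence_pct, nns_reported) in reported.items():
    nns = number_needed_to_screen(incidence_pct)
    gap_pct = abs(nns - nns_reported) / nns_reported * 100.0
    print(f"SIR {sir}: computed NNS {nns:.1f}, reported {nns_reported}, gap {gap_pct:.2f}%")
```

Under this assumption, each recomputed value falls within about 1% of the reported NNS.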

20
A Transformer-Based 2.5D Deep Learning Model for Preoperative Prediction of Lymph Node Metastasis in Papillary Thyroid Carcinoma

Xu, S.; Yan, X.; Su, Y.; Qi, J.; Chen, X.; Li, Y.; Xiong, H.; Jiang, J.; Wei, Z.; Chen, Z.; YALIKUN, Y.; Li, H.; Li, X.; Xi, Y.; Li, W.; Li, X.; Du, Y.

2026-04-02 oncology 10.64898/2026.04.01.26349933 medRxiv
Top 0.3%
1.3%

Background: Accurate preoperative prediction of lymph node metastasis (LNM) in papillary thyroid carcinoma (PTC) remains challenging, particularly in clinically node-negative (cN0) patients, leading to potential overtreatment. We aimed to develop and validate a Transformer-based 2.5D deep learning model (ThyLNT) using preoperative computed tomography (CT) images for robust prediction of LNM and to explore its underlying biological basis through multi-omics analyses. Methods: A total of 1,560 PTC patients from six hospitals were retrospectively included. The Tongji Hospital cohort (n=1,010) was divided into training (70%) and internal validation (30%) sets, while five independent institutions served as external test cohorts. For each lesion, seven 2.5D slices were extracted and modeled using a DenseNet201 backbone. Slice-level features were integrated using a Transformer-based feature-level fusion strategy and compared with ensemble learning, multi-instance learning (MIL), and traditional radiomics approaches. Model performance was assessed using area under the receiver operating characteristic curve (AUC), calibration analysis, decision curve analysis (DCA), and precision-recall curves. Multi-omics analyses, including bulk RNA-seq, single-cell RNA-seq, spatial transcriptomics, and spatial metabolomics, were performed to investigate biological correlates. Results: The Transformer-based model consistently outperformed comparator models across cohorts. In the training and validation cohorts, ThyLNT achieved AUCs of 0.882 and 0.787, respectively, with external AUCs ranging from 0.772 to 0.827. Compared with ultrasound (US) and CT, ThyLNT showed superior predictive performance (all P < 0.001 in the validation cohort). Simulation analysis in cN0 patients suggested that ThyLNT could reduce unnecessary lymph node dissection (LND) from 52.16% to 4.88%. 
Transcriptomic analysis combined with WGCNA and correlation analysis identified VEGFA as the gene most strongly associated with ThyLNT prediction scores. Single-cell and spatial transcriptomic analyses suggested metastasis-related tumor microenvironment remodeling, while enrichment analysis of genes affected by virtual knockout of VEGFA indicated involvement of angiogenesis- and epithelial-mesenchymal transition (EMT)-related pathways. Spatial metabolomics further revealed coordinated lipid metabolic reprogramming in metastatic tissues. These findings suggest that ThyLNT provides robust predictive performance while capturing biologically relevant features associated with metastatic progression.